Intermediate Python
Lecturer: Hugo Bowne-Anderson
It is recommended that you take the course Introduction to Python prior to this course.
1 Course Description
Learning Python is crucial for any aspiring data science practitioner. Learn to visualize real data with matplotlib’s functions and get acquainted with data structures such as the dictionary and pandas DataFrame. This four-hour intermediate course will help you to build on your existing Python skills and explore new Python applications and functions that expand your repertoire and help you work more efficiently.
You’ll discover how dictionaries offer an alternative to Python lists, and why the pandas DataFrame is the most popular way of working with tabular data. In the second chapter of this course, you’ll find out how you can create and manipulate datasets, and how to access them using these structures. Hands-on practice throughout the course will build your confidence in each area.
As you progress, you’ll look at logic, control flow, filtering and loops. These constructs control decision-making in Python programs and help you perform more operations on your data, including repeated statements. You’ll finish the course by applying all of your new skills, using hacker statistics to calculate your chances of winning a bet.
Once you’ve completed all of the chapters, you’ll be ready to apply your new skills in your job, new career, or personal project, and be prepared to move on to more advanced Python learning.
Course materials can be found here.
2 Matplotlib
Data visualization is a key skill for any data scientist. Learn how to build line plots, scatter plots and histograms with matplotlib, and how to customize them with labels, ticks, sizes and colors.
2.1 Lecture: Basic Plots with Matplotlib
2.2 Line Plot
With matplotlib, you can create a bunch of different plots in Python. The most basic plot is the line plot. A general recipe is given here.
In the video, you already saw how much the world population has grown over the past years. Will it continue to do so? The World Bank has estimates of the world population for the years 1950 up to 2100. The years are loaded in your workspace as a list called year, and the corresponding populations as a list called pop. The data can be found here.
# create year and pop
import pandas as pd
url = 'https://raw.githubusercontent.com/QuanNguyenIU/QuanNguyenIU.github.io/main/DataCamp/Python/Intermediate%20Python/year_pop.csv'
df = pd.read_csv(url)
year = list(df['year'])
pop = list(df['pop'])
# Print the last item from year and pop
print(year[-1], '\n', pop[-1])
## 2100
## 10.85
# Import matplotlib.pyplot as plt
import matplotlib.pyplot as plt
# Make a line plot: year on the x-axis, pop on the y-axis
plt.plot(year, pop)
# Display the plot with plt.show()
plt.show()
Great! Now that you’ve built your first line plot, let’s start working on the data that Professor Hans Rosling used to build his beautiful bubble chart. It was collected in 2007. Two lists are available for you:
- life_exp which contains the life expectancy for each country and
- gdp_cap, which contains the GDP per capita (i.e. per person) for each country expressed in US Dollars. The data can be found here.
GDP stands for Gross Domestic Product. It basically represents the size of the economy of a country. Divide this by the population and you get the GDP per capita.
# create gdp_cap and life_exp
url = 'https://raw.githubusercontent.com/QuanNguyenIU/QuanNguyenIU.github.io/main/DataCamp/Python/Intermediate%20Python/gdp_cap_life_exp.csv'
df = pd.read_csv(url)
gdp_cap = list(df.gdp_cap)
life_exp = list(df.life_exp)
# Print the last item of gdp_cap and life_exp
print(gdp_cap[-1], '\n', life_exp[-1])
## 469.70929810000007
## 43.487
# Make a line plot, gdp_cap on the x-axis, life_exp on the y-axis
plt.plot(gdp_cap, life_exp)
# Display the plot
plt.show()
Well done, but this doesn’t look right. Let’s build a plot that makes more sense.
2.3 Scatter Plot
When you have a time scale along the horizontal axis, the line plot is your friend. But in many other cases, when you’re trying to assess if there’s a correlation between two variables, for example, the scatter plot is the better choice. Below is an example of how to build a scatter plot.
Let’s continue with the gdp_cap versus life_exp plot, the GDP and life expectancy data for different countries in 2007. Maybe a scatter plot will be a better alternative?
# Change the line plot below to a scatter plot
plt.scatter(gdp_cap, life_exp)
# Put the x-axis on a logarithmic scale
plt.xscale('log')
# Show plot
plt.show()
That looks much better! You see that a higher GDP usually corresponds to a higher life expectancy. In other words, there is a positive correlation.
Do you think there’s a relationship between population and life expectancy of a country? The list life_exp from the previous exercise is already available. In addition, now also pop is available, listing the corresponding populations for the countries in 2007. The populations are in millions of people. The data can be found here.
# create pop
url = 'https://raw.githubusercontent.com/QuanNguyenIU/QuanNguyenIU.github.io/main/DataCamp/Python/Intermediate%20Python/pop.csv'
df = pd.read_csv(url)
pop = list(df['pop'])
# Build Scatter plot
plt.scatter(pop, life_exp)
# Show plot
plt.show()
Nice! There’s no clear relationship between population and life expectancy, which makes perfect sense.
2.4 Lecture: Histogram
2.5 Build a Histogram
To see how life expectancy in different countries is distributed, let’s create a histogram of life_exp.
Great job! In the above plot, you didn’t specify the number of bins. By default, matplotlib sets the number of bins to 10 in that case. The number of bins is pretty important. Too few bins will oversimplify reality and won’t show you the details. Too many bins will over-complicate reality and won’t show the bigger picture.
To control the number of bins to divide your data in, you can set the bins argument.
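As a hedged illustration of the bins argument, the sketch below uses a short made-up list, sample_life_exp (an assumption, not the course data), and reads back the bin counts and bin edges that plt.hist() returns. Add plt.show() after a plt.hist() call to display the plot.

```python
import matplotlib.pyplot as plt

# Hypothetical sample values, not the course's life_exp list
sample_life_exp = [43.5, 52.3, 59.7, 64.1, 68.9, 71.2, 72.4, 74.9, 78.6, 80.7, 82.6]

# Default call: no bins argument, so matplotlib falls back to 10 bins
counts_default, edges_default, _ = plt.hist(sample_life_exp)
plt.close()

# Explicit call: divide the same data into 5 bins
counts_5, edges_5, _ = plt.hist(sample_life_exp, bins=5)
plt.close()

# Number of bins used in each call
print(len(counts_default), len(counts_5))
```

With n bins there are always n + 1 bin edges, which is why edges_5 has six entries.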
2.6 Choose the right Plot
In the video, you saw population pyramids for the present day and for the future. Because we were using a histogram, it was very easy to make a comparison.
Let’s do a similar comparison. life_exp contains life expectancy data for different countries in 2007. You also have access to a second list now, life_exp1950, containing similar data for 1950. The data can be found here.
# create life_exp1950
url = 'https://raw.githubusercontent.com/QuanNguyenIU/QuanNguyenIU.github.io/main/DataCamp/Python/Intermediate%20Python/life_exp1950.csv'
df = pd.read_csv(url)
life_exp1950 = list(df['life_exp1950'])
# Histogram of life_exp, 15 bins
plt.hist(life_exp, bins = 15)
plt.show()
2.7 Lecture: Customization
2.8 Labels
It’s time to customize your own plot. This is the fun part, you will see your plot come to life!
You’re going to work on the scatter plot with world development data: GDP per capita on the x-axis (logarithmic scale), life expectancy on the y-axis.
# Basic scatter plot, log scale
plt.scatter(gdp_cap, life_exp)
plt.xscale('log')
# Strings
xlab = 'GDP per Capita [in USD]'
ylab = 'Life Expectancy [in years]'
title = 'World Development in 2007'
# Add axis labels
plt.xlabel(xlab)
plt.ylabel(ylab)
# Add title
plt.title(title)
# After customizing, display the plot
plt.show()
This looks much better already!
2.9 Ticks
In the video, Hugo has demonstrated how you could control the y-ticks by specifying two arguments:
In this example, the ticks corresponding to the numbers 0, 1 and 2 will be replaced by one, two and three, respectively.
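The two-argument call described above can be sketched as follows. The plotted points are placeholders chosen only so that the y-axis covers 0, 1 and 2; the first argument of plt.yticks() gives the tick positions and the second gives the labels shown at those positions.

```python
import matplotlib.pyplot as plt

# Placeholder data so that the y-axis spans 0 to 2
plt.plot([0, 1, 2], [0, 1, 2])

# First argument: tick positions; second argument: labels to display there
plt.yticks([0, 1, 2], ["one", "two", "three"])

# Calling plt.yticks() without arguments returns the current positions and labels
locs, labels = plt.yticks()
print(list(locs))
print([label.get_text() for label in labels])
plt.close()
```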
Let’s do a similar thing for the x-axis of your world development chart, with the xticks() function. The tick values 1000, 10000 and 100000 should be replaced by 1k, 10k and 100k.
# Scatter plot
plt.scatter(gdp_cap, life_exp)
# Previous customizations
plt.xscale('log')
plt.xlabel(xlab)
plt.ylabel(ylab)
plt.title(title)
# Definition of tick_val and tick_lab
tick_val = [1000, 10000, 100000]
tick_lab = ['1k', '10k', '100k']
# Adapt the ticks on the x-axis
plt.xticks(tick_val, tick_lab);
# After customizing, display the plot
plt.show()
Great! Your plot is shaping up nicely!
2.10 Sizes
Right now, the scatter plot is just a cloud of blue dots, indistinguishable from each other. Let’s change this. Wouldn’t it be nice if the size of the dots corresponds to the population?
# Import numpy as np
import numpy as np
# Store pop as a numpy array: np_pop
np_pop = np.array(pop)
# Double np_pop
np_pop = np_pop * 2
# Set s argument to np_pop
plt.scatter(gdp_cap, life_exp, s = np_pop)
# Previous customizations
plt.xscale('log')
plt.xlabel(xlab)
plt.ylabel(ylab)
plt.title(title)
plt.xticks(tick_val, tick_lab);
# Display the plot
plt.show()
2.11 Colors
The next step is making the plot more colorful! To do this, a list col has been given for you. It’s a list with a color for each corresponding country, depending on the continent the country is part of. The data can be found here.
# create col
url = 'https://raw.githubusercontent.com/QuanNguyenIU/QuanNguyenIU.github.io/main/DataCamp/Python/Intermediate%20Python/col.csv'
df = pd.read_csv(url)
col = list(df['col'])
# Specify c and alpha inside plt.scatter()
plt.scatter(x = gdp_cap, y = life_exp,
s = np.array(pop) * 2, c = col, alpha = 0.8)
# Previous customizations
plt.xscale('log')
plt.xlabel(xlab)
plt.ylabel(ylab)
plt.title(title)
plt.xticks(tick_val, tick_lab);
# Show the plot
plt.show()
Nice! This is looking more and more like Hans Rosling’s plot!
2.12 Additional Customizations
# Scatter plot
plt.scatter(x = gdp_cap, y = life_exp,
s = np.array(pop) * 2, c = col, alpha = 0.8)
# Previous customizations
plt.xscale('log')
plt.xlabel(xlab)
plt.ylabel(ylab)
plt.title(title)
plt.xticks(tick_val, tick_lab);
# Additional customizations
plt.text(1550, 71, 'India')
plt.text(5700, 80, 'China')
# Add grid() call
plt.grid()
# Show the plot
plt.show()
3 Dictionaries & Pandas
Learn about the dictionary, an alternative to the Python list, and the pandas DataFrame, the de facto standard to work with tabular data in Python. You will get hands-on practice with creating and manipulating datasets, and you’ll learn how to access the information you need from these data structures.
3.1 Lecture: Dictionaries, Part 1
3.2 Motivation for Dictionaries
To see why dictionaries are useful, have a look at the two lists defined below. countries contains the names of some European countries. capitals lists the names of their capitals, in the same order.
# Definition of countries and capital
countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']
# Get index of 'germany': ind_ger
ind_ger = countries.index('germany')
# Use ind_ger to print out capital of Germany
print(capitals[ind_ger])
## berlin
As Hugo already told you: this works, but it’s not very convenient.
3.3 Create Dictionaries
The countries and capitals lists are again available below. Let’s convert this data to a dictionary where the country names are the keys and the capitals are the corresponding values. As a refresher, here is a recipe for creating a dictionary:
# Definition of countries and capital
countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']
# From string in countries and capitals, create dictionary europe
europe = {'spain':'madrid', 'france':'paris',
'germany':'berlin', 'norway':'oslo'}
# Print europe
print(europe)
## {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin', 'norway': 'oslo'}
Great! Now that you’ve built your first dictionaries, let’s get serious!
3.4 Access Dictionaries
If the keys of a dictionary are chosen wisely, accessing the values in a dictionary is easy and intuitive. For example, to get the capital for France from europe you can use:
Here, 'france' is the key and the value 'paris' is returned.
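A minimal sketch of dictionary access, reusing the europe dictionary defined earlier. Besides the lookup for France, it lists the keys with keys() and looks up the capital of Norway.

```python
europe = {'spain': 'madrid', 'france': 'paris',
          'germany': 'berlin', 'norway': 'oslo'}

# Look up a value by its key
print(europe['france'])   # paris

# List all keys, then look up another capital
print(europe.keys())      # dict_keys(['spain', 'france', 'germany', 'norway'])
print(europe['norway'])   # oslo
```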
## dict_keys(['spain', 'france', 'germany', 'norway'])
## oslo
Good job, now you’re warmed up for some more.
3.5 Lecture: Dictionaries, Part 2
3.6 Dictionary Manipulation
If you know how to access a dictionary, you can also assign a new value to it. To add a new key-value pair to europe you can use something like this:
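Adding key-value pairs can be sketched like this, again starting from the four-country europe dictionary. Assigning to a key that does not exist yet creates it, and the in operator checks whether a key is present.

```python
europe = {'spain': 'madrid', 'france': 'paris',
          'germany': 'berlin', 'norway': 'oslo'}

# Assigning to a new key adds the pair to the dictionary
europe['italy'] = 'rome'

# Check that the key is now present
print('italy' in europe)   # True

# Add another pair and print the result
europe['poland'] = 'warsaw'
print(europe)
```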
## True
## {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin', 'norway': 'oslo', 'italy': 'rome', 'poland': 'warsaw'}
Well done! Europe is growing by the minute! Notice that the printout lists the keys in the order in which they were added: since Python 3.7, dictionaries preserve insertion order. In older versions of Python, the order of a dictionary was not guaranteed.
Somebody thought it would be funny to mess with your accurately generated dictionary. An adapted version of the europe dictionary is available below. Let’s clean up! Do not do this by adapting the definition of europe, but by adding Python commands to update and remove key:value pairs.
# Definition of dictionary
europe = {'spain':'madrid', 'france':'paris', 'germany':'bonn',
'norway':'oslo', 'italy':'rome', 'poland':'warsaw',
'australia':'vienna'}
# Update capital of germany
europe['germany'] = 'berlin'
# Remove australia
del europe['australia']
# Print europe
print(europe)
## {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin', 'norway': 'oslo', 'italy': 'rome', 'poland': 'warsaw'}
Great job! That’s much better!
3.7 Dictionariception
Remember lists? They could contain anything, even other lists. Well, for dictionaries the same holds. Dictionaries can contain key:value pairs where the values are again dictionaries.
As an example, have a look at the script below where another version of europe - the dictionary you’ve been working with all along - is coded. The keys are still the country names, but the values are dictionaries that contain more information than just the capital.
It’s perfectly possible to chain square brackets to select elements. To fetch the population for Spain from europe, for example, you need:
# Dictionary of dictionaries
europe = {'spain': {'capital':'madrid', 'population':46.77},
'france': {'capital':'paris', 'population':66.03},
'germany': {'capital':'berlin', 'population':80.62},
'norway': {'capital':'oslo', 'population':5.084}
}
# Print out the capital of France
print(europe['france']['capital'])
## paris
# Create sub-dictionary data
data = {'capital':'rome', 'population':59.83}
# Add data to europe under key 'italy'
europe['italy'] = data
# Print europe
print(europe)
## {'spain': {'capital': 'madrid', 'population': 46.77}, 'france': {'capital': 'paris', 'population': 66.03}, 'germany': {'capital': 'berlin', 'population': 80.62}, 'norway': {'capital': 'oslo', 'population': 5.084}, 'italy': {'capital': 'rome', 'population': 59.83}}
Great! It’s time to learn about a new data structure!
3.8 Lecture: Pandas, Part 1
3.9 Dictionary to DataFrame
pandas is an open source library, providing high-performance, easy-to-use data structures and data analysis tools for Python. Sounds promising!
The DataFrame is one of Pandas’ most important data structures. It’s basically a way to store tabular data where you can label the rows and the columns. One way to build a DataFrame is from a dictionary.
In the exercises that follow you will be working with vehicle data from different countries. Each observation corresponds to a country and the columns give information about the number of vehicles per capita, whether people drive left or right, and so on.
Three lists are defined in the script:
- names, containing the country names for which data is available.
- dr, a list with booleans that tells whether people drive left or right in the corresponding country.
- cpc, the number of motor vehicles per 1000 people in the corresponding country.
Each dictionary key is a column label and each value is a list which contains the column elements.
# Pre-defined lists
names = ['United States', 'Australia', 'Japan',
'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
# Create dictionary my_dict with three key:value pairs: my_dict
my_dict = {'country':names, 'drives_right':dr, 'cars_per_cap':cpc}
# Build a DataFrame cars from my_dict: cars
cars = pd.DataFrame(my_dict)
# Print cars
print(cars)
## country drives_right cars_per_cap
## 0 United States True 809
## 1 Australia False 731
## 2 Japan False 588
## 3 India False 18
## 4 Russia True 200
## 5 Morocco True 70
## 6 Egypt True 45
Good job! Notice that the columns of cars can be of different types. This was not possible with 2D NumPy arrays!
Have you noticed that the row labels (i.e. the labels for the different observations) were automatically set to integers from 0 up to 6? To solve this, a list row_labels has been created. You can use it to specify the row labels of the cars DataFrame. You do this by setting the index attribute of cars, that you can access as cars.index.
# Definition of row_labels
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']
# Specify row labels of cars
cars.index = row_labels
# Print cars again
print(cars)
## country drives_right cars_per_cap
## US United States True 809
## AUS Australia False 731
## JPN Japan False 588
## IN India False 18
## RU Russia True 200
## MOR Morocco True 70
## EG Egypt True 45
Nice! That looks much better already!
3.10 CSV to DataFrame
Putting data in a dictionary and then building a DataFrame works, but it’s not very efficient. What if you’re dealing with millions of observations? In those cases, the data is typically available as files with a regular structure. One of those file types is the CSV file, which is short for “comma-separated values”.
To import CSV data into Python as a Pandas DataFrame you can use read_csv().
Let’s explore this function with the same cars data from the previous exercises. This time, however, the data is available in a CSV file, named cars.csv. The data can be found here.
# Import the cars.csv data: cars
url = 'https://raw.githubusercontent.com/QuanNguyenIU/QuanNguyenIU.github.io/main/DataCamp/Python/Intermediate%20Python/cars.csv'
cars = pd.read_csv(url)
# Print out cars
print(cars)
## Unnamed: 0 cars_per_cap country drives_right
## 0 US 809 United States True
## 1 AUS 731 Australia False
## 2 JPN 588 Japan False
## 3 IN 18 India False
## 4 RU 200 Russia True
## 5 MOR 70 Morocco True
## 6 EG 45 Egypt True
Nice job! Looks nice, but not exactly what we expected. Your read_csv() call to import the CSV data didn’t generate an error, but the output is not entirely what we wanted. The row labels were imported as another column without a name.
Remember index_col, an argument of read_csv(), that you can use to specify which column in the CSV file should be used as a row label? Well, that’s exactly what you need here!
# Fix import by including index_col
cars = pd.read_csv(url, index_col = 0)
# Print out cars
print(cars)
## cars_per_cap country drives_right
## US 809 United States True
## AUS 731 Australia False
## JPN 588 Japan False
## IN 18 India False
## RU 200 Russia True
## MOR 70 Morocco True
## EG 45 Egypt True
That’s much better!
3.11 Lecture: Pandas, Part 2
3.12 Square Brackets
In the video, you saw that you can index and select Pandas DataFrames in many different ways. The simplest, but not the most powerful way, is to use square brackets. To select only the cars_per_cap column from cars, you can use:
The single bracket version gives a Pandas Series, the double bracket version gives a Pandas DataFrame.
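The difference between the two bracket forms can be sketched as follows. To keep the example self-contained, the cars DataFrame is rebuilt inline from the values printed above rather than read from cars.csv.

```python
import pandas as pd

# cars rebuilt inline from the values shown above (instead of reading cars.csv)
cars = pd.DataFrame(
    {'cars_per_cap': [809, 731, 588, 18, 200, 70, 45],
     'country': ['United States', 'Australia', 'Japan', 'India',
                 'Russia', 'Morocco', 'Egypt'],
     'drives_right': [True, False, False, False, True, True, True]},
    index=['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG'])

# Single brackets return a pandas Series
as_series = cars['cars_per_cap']

# Double brackets return a pandas DataFrame with one column
as_frame = cars[['cars_per_cap']]

print(type(as_series).__name__, type(as_frame).__name__)
```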
## US United States
## AUS Australia
## JPN Japan
## IN India
## RU Russia
## MOR Morocco
## EG Egypt
## Name: country, dtype: object
## country
## US United States
## AUS Australia
## JPN Japan
## IN India
## RU Russia
## MOR Morocco
## EG Egypt
# Print out DataFrame with country and drives_right columns
print(cars[['country', 'drives_right']])
## country drives_right
## US United States True
## AUS Australia False
## JPN Japan False
## IN India False
## RU Russia True
## MOR Morocco True
## EG Egypt True
Nice! Square brackets can do more than just selecting columns. You can also use them to get rows, or observations, from a DataFrame. The following call selects the first five rows from the cars DataFrame:
The result is another DataFrame containing only the rows you specified. Pay attention: You can only select rows using square brackets if you specify a slice, like 0:4. Also, you’re using the integer indexes of the rows here, not the row labels!
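A short sketch of row selection with slices, using an inline copy of the cars data. As with list slicing, the end of the slice is excluded, so 0:5 returns the rows at positions 0 through 4 and 0:4 returns four rows.

```python
import pandas as pd

# cars rebuilt inline from the values shown above (instead of reading cars.csv)
cars = pd.DataFrame(
    {'cars_per_cap': [809, 731, 588, 18, 200, 70, 45],
     'country': ['United States', 'Australia', 'Japan', 'India',
                 'Russia', 'Morocco', 'Egypt'],
     'drives_right': [True, False, False, False, True, True, True]},
    index=['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG'])

# First five rows: integer positions 0, 1, 2, 3 and 4
first_five = cars[0:5]
print(list(first_five.index))

# First three rows, then the fourth, fifth and sixth rows
print(cars[0:3])
print(cars[3:6])
```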
## cars_per_cap country drives_right
## US 809 United States True
## AUS 731 Australia False
## JPN 588 Japan False
## cars_per_cap country drives_right
## IN 18 India False
## RU 200 Russia True
## MOR 70 Morocco True
You can get interesting information, but using square brackets to do indexing is rather limited. Experiment with more advanced techniques in the following exercises.
3.13 loc and iloc
With loc and iloc you can do practically any data selection operation on DataFrames you can think of. loc is label-based, which means that you have to specify rows and columns based on their row and column labels. iloc is integer index based, so you have to specify rows and columns by their integer index like you did in the previous exercise.
Try out the following commands to experiment with loc and iloc to select observations. Each pair of commands here gives the same result.
cars.loc['RU']
cars.iloc[4]
cars.loc[['RU']]
cars.iloc[[4]]
cars.loc[['RU', 'AUS']]
cars.iloc[[4, 1]]
## cars_per_cap 588
## country Japan
## drives_right False
## Name: JPN, dtype: object
## cars_per_cap country drives_right
## AUS 731 Australia False
## EG 45 Egypt True
loc and iloc also allow you to select both rows and columns from a DataFrame. To experiment, try out the following commands. Again, paired commands produce the same result.
cars.loc['IN', 'cars_per_cap']
cars.iloc[3, 0]
cars.loc[['IN', 'RU'], 'cars_per_cap']
cars.iloc[[3, 4], 0]
cars.loc[['IN', 'RU'], ['cars_per_cap', 'country']]
cars.iloc[[3, 4], [0, 1]]
## True
## country drives_right
## RU Russia True
## MOR Morocco True
It’s also possible to select only columns with loc and iloc. In both cases, you simply put a slice going from beginning to end in front of the comma:
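Column-only selection with loc and iloc can be sketched as below, on an inline copy of the cars data. The colon before the comma is the slice that keeps every row.

```python
import pandas as pd

# cars rebuilt inline from the values shown above (instead of reading cars.csv)
cars = pd.DataFrame(
    {'cars_per_cap': [809, 731, 588, 18, 200, 70, 45],
     'country': ['United States', 'Australia', 'Japan', 'India',
                 'Russia', 'Morocco', 'Egypt'],
     'drives_right': [True, False, False, False, True, True, True]},
    index=['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG'])

# All rows, one column, selected by label and by integer position: both give a Series
by_label = cars.loc[:, 'drives_right']
by_position = cars.iloc[:, 2]

# Wrapping the column label in a list gives a DataFrame instead
as_frame = cars.loc[:, ['drives_right']]

print(by_label)
print(as_frame)
```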
## US True
## AUS False
## JPN False
## IN False
## RU True
## MOR True
## EG True
## Name: drives_right, dtype: bool
## drives_right
## US True
## AUS False
## JPN False
## IN False
## RU True
## MOR True
## EG True
# Print out cars_per_cap and drives_right as DataFrame
print(cars.loc[:, ['cars_per_cap', 'drives_right']])
## cars_per_cap drives_right
## US 809 True
## AUS 731 False
## JPN 588 False
## IN 18 False
## RU 200 True
## MOR 70 True
## EG 45 True
What a drill on indexing and selecting data from Pandas DataFrames! You’ve done great! It’s time to head over to the next chapter to learn all about logic, control flow, and filtering!
4 Logic, Control Flow & Filtering
Boolean logic is the foundation of decision-making in Python programs. Learn about different comparison operators, how to combine them with Boolean operators, and how to use the Boolean outcomes in control structures. You’ll also learn to filter data in pandas DataFrames using logic.
4.1 Lecture: Comparison Operators
4.2 Equality
To check if two Python values, or variables, are equal you can use ==. To check for inequality, you need !=. As a refresher, have a look at the following examples that all result in True.
When you write these comparisons in a script, you will need to wrap a print() function around them to see the output.
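A few illustrative comparisons that all evaluate to True are sketched below, each wrapped in print() as described. The specific values are examples chosen here. The last line shows that True compares equal to the integer 1.

```python
# Equality and inequality checks that all evaluate to True
print(2 == (1 + 1))
print("intermediate" != "python")
print(True != False)

# String comparison is case-sensitive
print("Python" != "python")

# A boolean behaves like an integer: True equals 1
print(True == 1)
```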
## False
## True
## False
## True
The last comparison worked fine because actually, a boolean is a special kind of integer: True corresponds to 1, False corresponds to 0.
4.3 Greater and Less than
In the video, Hugo also talked about the less than and greater than signs, < and >, in Python. You can combine them with an equals sign: <= and >=. Pay attention: <= is valid syntax, but =< is not.
All Python expressions in the following code chunk evaluate to True:
Remember that for string comparison, Python determines the relationship based on alphabetical order.
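As a small sketch, the following expressions all evaluate to True. The values are illustrative. The string comparison holds because "alpha" comes before "beta" alphabetically.

```python
# Numeric comparisons
print(3 < 4)
print(3 <= 4)

# String comparison follows alphabetical order
print("alpha" <= "beta")

# Booleans compare as integers: True is 1, False is 0
print(True > False)
```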
## False
## True
## True
4.4 Compare Arrays
Out of the box, you can also use comparison operators with NumPy arrays.
Remember areas, the list of area measurements for different rooms in your house from Introduction to Python? This time there are two NumPy arrays: my_house and your_house. They both contain the areas for the kitchen, living room, bedroom and bathroom in the same order, so you can compare them.
# Create arrays
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])
# my_house greater than or equal to 18
print(my_house >= 18)
## [ True True False False]
# my_house less than your_house
print(my_house < your_house)
## [False True True False]
4.5 Lecture: Boolean Operators
4.6 and, or, not
A boolean is either 1 or 0, True or False. With boolean operators such as and, or and not, you can combine these booleans to perform more advanced queries on your data.
# Define variables
my_kitchen = 18.0
your_kitchen = 14.0
# my_kitchen bigger than 10 and smaller than 18?
print(my_kitchen > 10 and my_kitchen < 18)
## False
## True
## True
4.7 Boolean Operators with NumPy
Before, the comparison operators like < and >= worked with NumPy arrays out of the box. Unfortunately, this is not true for the boolean operators and, or, and not.
To use these operators with NumPy, you will need np.logical_and(), np.logical_or() and np.logical_not(). Here’s an example on the my_house and your_house arrays from before to give you an idea:
# my_house greater than 18.5 or smaller than 10
print(np.logical_or(my_house > 18.5, my_house < 10))
## [False True False True]
# Both my_house and your_house smaller than 11
print(np.logical_and(my_house < 11, your_house < 11))
## [False False False True]
4.8 Lecture: if, elif, else
4.9 if
It’s time to take a closer look around in your house.
# Define variables
room = "kit"
area = 14.0
# if statement for room
if room == "kit" :
    print("looking around in the kitchen.")
## looking around in the kitchen.
“big place!” wasn’t printed, because area > 15 is not True. Experiment with other values of room and area to see how the printouts change.
4.10 Add else
# if-else construct for room
if room == "kit" :
    print("looking around in the kitchen.")
else :
    print("looking around elsewhere.")
## looking around in the kitchen.
## pretty small.
4.11 Customizing Further: elif
It’s also possible to have a look around in the bedroom.
# Define variables
room = "bed"
area = 14.0
# if-elif-else construct for room
if room == "kit" :
    print("looking around in the kitchen.")
elif room == "bed":
    print("looking around in the bedroom.")
else :
    print("looking around elsewhere.")
## looking around in the bedroom.
# if-elif-else construct for area
if area > 15 :
    print("big place!")
elif area > 10:
    print('medium size, nice!')
else :
    print("pretty small.")
## medium size, nice!
4.12 Lecture: Filtering Pandas DataFrame
4.13 Driving Right
Remember that cars dataset, containing the cars per 1000 people (cars_per_cap) and whether people drive right (drives_right) for different countries (country)?
In the video, you saw a step-by-step approach to filter observations from a DataFrame based on boolean arrays. Let’s start simple and try to find all observations in cars where drives_right is True.
drives_right is a boolean column, so you’ll have to extract it as a Series and then use this boolean Series to select observations from cars.
# Extract drives_right column as Series: dr
dr = cars.loc[:, 'drives_right']
# Use dr to subset cars: sel
sel = cars[dr]
# Print sel
print(sel)
## cars_per_cap country drives_right
## US 809 United States True
## RU 200 Russia True
## MOR 70 Morocco True
## EG 45 Egypt True
The code above worked fine, but you actually unnecessarily created a new variable dr. You can achieve the same result without this intermediate variable.
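A one-line version can be sketched as follows, on an inline copy of the cars data: the boolean column is passed straight into the square brackets, so no intermediate variable is needed.

```python
import pandas as pd

# cars rebuilt inline from the values shown above (instead of reading cars.csv)
cars = pd.DataFrame(
    {'cars_per_cap': [809, 731, 588, 18, 200, 70, 45],
     'country': ['United States', 'Australia', 'Japan', 'India',
                 'Russia', 'Morocco', 'Egypt'],
     'drives_right': [True, False, False, False, True, True, True]},
    index=['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG'])

# Use the boolean column directly to subset the rows
sel = cars[cars['drives_right']]
print(sel)
```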
## cars_per_cap country drives_right
## US 809 United States True
## RU 200 Russia True
## MOR 70 Morocco True
## EG 45 Egypt True
cars contains 7 rows or observations, sel contains 4; so in the majority of the countries in your dataset, people drive on the right side of the road.
4.14 Cars per Capita
Let’s stick to the cars data some more. This time you want to find out which countries have a high cars per capita figure. In other words, in which countries do many people have a car, or maybe multiple cars.
# Create car_maniac: observations that have a cars_per_cap over 500
cpc = cars.loc[:, 'cars_per_cap']
many_cars = cpc > 500
car_maniac = cars[many_cars]
# Print car_maniac
print(car_maniac)
## cars_per_cap country drives_right
## US 809 United States True
## AUS 731 Australia False
## JPN 588 Japan False
The output shows that the US, Australia and Japan have a cars_per_cap of over 500.
Remember np.logical_and(), np.logical_or() and np.logical_not(), the NumPy variants of the and, or and not operators? You can also use them on Pandas Series to do more advanced filtering operations.
Take this example that selects the observations that have a cars_per_cap between 10 and 80. Try out these lines of code step by step to see what’s happening.
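The example described above might look like the sketch below. It assumes "between 10 and 80" means strictly greater than 10 and strictly less than 80, and it rebuilds cars inline so that it runs on its own.

```python
import numpy as np
import pandas as pd

# cars rebuilt inline from the values shown above (instead of reading cars.csv)
cars = pd.DataFrame(
    {'cars_per_cap': [809, 731, 588, 18, 200, 70, 45],
     'country': ['United States', 'Australia', 'Japan', 'India',
                 'Russia', 'Morocco', 'Egypt'],
     'drives_right': [True, False, False, False, True, True, True]},
    index=['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG'])

# Step 1: extract the column as a Series
cpc = cars['cars_per_cap']

# Step 2: combine two element-wise comparisons into one boolean Series
between = np.logical_and(cpc > 10, cpc < 80)

# Step 3: use the boolean Series to subset the rows
medium = cars[between]
print(medium)
```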
# Create medium: observations with cars_per_cap between 100 and 500
cpc = cars['cars_per_cap']
between = np.logical_and(cpc >= 100, cpc <= 500)
medium = cars[between]
# Print medium
print(medium)
## cars_per_cap country drives_right
## RU 200 Russia True
5 Loops
There are several techniques you can use to repeatedly execute Python code. While loops are like repeated if statements, and for loops iterate over all kinds of data structures. Learn all about them in this chapter.
5.1 Lecture: while loop
5.2 Basic while loop
Below you can find the example from the video where the error variable, initially equal to 50.0, is divided by 4 and printed out on every run:
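A sketch of that loop is given below. The stopping condition error > 1 is an assumption made for this illustration, since the text only states the starting value and the division by 4.

```python
error = 50.0

# Keep dividing by 4 until error is no longer above 1 (assumed condition)
while error > 1:
    error = error / 4
    print(error)
```

Under that assumption the loop runs three times and prints 12.5, 3.125 and 0.78125.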
This example will come in handy, because it’s time to build a while loop yourself! We’re going to code a while loop that implements a very basic control system for an inverted pendulum. If there’s an offset from standing perfectly straight, the while loop will incrementally fix this offset.
Note that if your while loop takes too long to run, you might have made a mistake. In particular, remember to indent the contents of the loop using four spaces or auto-indentation!
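A minimal sketch of such a loop follows. The starting value of 8 is inferred from the output below, which counts down from 7 to 0. Each run prints a message, reduces offset by one and prints the new value.

```python
# Initial offset (inferred from the printed output, not stated in the text)
offset = 8

# Correct the offset one step at a time until it reaches 0
while offset != 0:
    print("correcting...")
    offset = offset - 1
    print(offset)
```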
## correcting...
## 7
## correcting...
## 6
## correcting...
## 5
## correcting...
## 4
## correcting...
## 3
## correcting...
## 2
## correcting...
## 1
## correcting...
## 0
5.3 Add conditionals
The while loop that corrects the offset is a good start, but what if offset is negative? You can try to run the following code where offset is initialized to -6:
The while loop will never stop running, because offset will be further decreased on every run. offset != 0 will never become False and the while loop continues forever. Fix things by putting an if-else statement inside the while loop. If your code is still taking too long to run, you probably made a mistake!
offset = -6
while offset != 0 :
    print("correcting...")
    if offset > 0 :
        offset = offset - 1
    else :
        offset = offset + 1
    print(offset)
## correcting...
## -5
## correcting...
## -4
## correcting...
## -3
## correcting...
## -2
## correcting...
## -1
## correcting...
## 0
The while loop is not that often used in Data Science, so let’s head over to the for loop.
5.4 Lecture: for loop
5.5 Loop over a list
Have another look at the for loop that Hugo showed in the video:
As usual, you simply have to indent the code with 4 spaces to tell Python which code should be executed in the for loop.
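A basic for loop over a list can be sketched as follows. The areas list is filled in from the values in the output below; the indented line is the loop body and runs once per element.

```python
# Room areas, taken from the printed output below
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Print each area, one per run of the loop
for area in areas:
    print(area)
```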
## 11.25
## 18.0
## 20.0
## 10.75
## 9.5
5.6 Indexes and values
Using a for loop to iterate over a list only gives you access to every list element in each run, one after the other. If you also want to access the index information, that is, where the list element you’re iterating over is located, you can use enumerate().
As an example, have a look at how the for loop from the video was converted:
fam = [1.73, 1.68, 1.71, 1.89]
for index, height in enumerate(fam) :
    print("person " + str(index) + ": " + str(height))
# areas list (values as shown in the output below)
areas = [11.25, 18.0, 20.0, 10.75, 9.50]
# Change for loop to use enumerate() and update print()
for i, a in enumerate(areas):
    print('room ' + str(i) + ': ' + str(a))
## room 0: 11.25
## room 1: 18.0
## room 2: 20.0
## room 3: 10.75
## room 4: 9.5
For non-programmer folks, room 0: 11.25 is strange. Wouldn’t it be better if the count started at 1?
## room 1: 11.25
## room 2: 18.0
## room 3: 20.0
## room 4: 10.75
## room 5: 9.5
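The code for this variant is not shown above. One possible sketch uses the start argument of enumerate(); the areas list is again assumed to contain the five values from the output:

```python
areas = [11.25, 18.0, 20.0, 10.75, 9.5]

# start=1 makes enumerate() count from 1 instead of 0.
labels = []
for i, a in enumerate(areas, start=1):
    label = "room " + str(i) + ": " + str(a)
    labels.append(label)
    print(label)
```

An equivalent alternative is to keep enumerate(areas) and print str(i + 1) instead of str(i).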
5.7 Loop over list of lists
Remember the house variable from the Introduction to Python course? Have a look at its definition below. It’s basically a list of lists, where each sublist contains the name and area of a room in your house.
house = [["hallway", 11.25],
         ["kitchen", 18.0],
         ["living room", 20.0],
         ["bedroom", 10.75],
         ["bathroom", 9.50]]
for a in house:
    print('the ' + str(a[0]) + ' is ' + str(a[1]) + ' sqm')
## the hallway is 11.25 sqm
## the kitchen is 18.0 sqm
## the living room is 20.0 sqm
## the bedroom is 10.75 sqm
## the bathroom is 9.5 sqm
5.8 Lecture: Loop Data Structures, Part 1
5.9 Loop over dictionary
In Python 3, you need the items() method to loop over a dictionary:
world = { "afghanistan":30.55,
          "albania":2.77,
          "algeria":39.21 }
for key, value in world.items() :
    print(key + " -- " + str(value))
Remember the europe dictionary that contained the names of some European countries as key and their capitals as corresponding value? Let’s write a loop to iterate over it!
# Definition of dictionary
europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin',
          'norway':'oslo', 'italy':'rome', 'poland':'warsaw', 'austria':'vienna' }
# Iterate over europe
for key, value in europe.items():
    print('the capital of ' + str(key) + ' is ' + str(value))
## the capital of spain is madrid
## the capital of france is paris
## the capital of germany is berlin
## the capital of norway is oslo
## the capital of italy is rome
## the capital of poland is warsaw
## the capital of austria is vienna
5.10 Loop over NumPy array
If you’re dealing with a 1D NumPy array, looping over all elements can be as simple as:
If you’re dealing with a 2D NumPy array, it’s more complicated. A 2D array is built up of multiple 1D arrays. To explicitly iterate over all separate elements of a multi-dimensional array, you’ll need this syntax:
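The two snippets referred to above are missing from these notes. A minimal sketch of both patterns, using small made-up arrays (heights and measurements are hypothetical names and values, not the course data):

```python
import numpy as np

# Hypothetical example arrays, not the course data.
heights = np.array([1.73, 1.68, 1.71])
measurements = np.array([[1.73, 65.4],
                         [1.68, 59.2],
                         [1.71, 63.6]])

# 1D array: a plain for loop visits every element.
for x in heights:
    print(x)

# 2D array: a plain for loop yields one row (a 1D array) per run...
for row in measurements:
    print(row)

# ...while np.nditer() visits every single element, row by row.
flat = [float(x) for x in np.nditer(measurements)]
print(flat)
```

For an array laid out like this one, np.nditer() walks through the first row, then the second, and so on, which is why heights and weights alternate when a two-column array is flattened this way.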
Two NumPy arrays that you might recognize from the intro course are available: np_height, a NumPy array containing the heights of Major League Baseball players, and np_baseball, a 2D NumPy array that contains both the heights (first column) and weights (second column) of those players.
import numpy as np
import pandas as pd

# create height_in and weight_lb
url = 'https://raw.githubusercontent.com/QuanNguyenIU/QuanNguyenIU.github.io/main/DataCamp/Python/Intro.%20to%20Python/baseball.csv'
df = pd.read_csv(url)
height_in = list(df.Height)
weight_lb = list(df.Weight)
baseball = [list(i) for i in list(zip(height_in, weight_lb))]
np_height = np.array(height_in)
np_baseball = np.array(baseball)
# For loop over np_height
for x in np_height:
    print(str(x) + ' inches', end = '; ')
## 74 inches; 74 inches; 72 inches; 72 inches; 73 inches; 69 inches; ... 75 inches; 75 inches; 73 inches;
# For loop over np_baseball
for x in np.nditer(np_baseball):
    print(str(x), end = '; ')
## 74; 180; 74; 215; 72; 210; 72; 210; 73; 188; ... 75; 205; 75; 190; 73; 195;
5.11 Lecture: Loop Data Structures, Part 2
5.12 Loop over DataFrame
Iterating over a Pandas DataFrame is typically done with the iterrows() method. Used in a for loop, every observation is iterated over and on every iteration the row label and actual row contents are available:
In this exercise you will be working on the cars DataFrame. It contains information on the cars per capita and whether people drive right or left for seven countries in the world.
## US
## cars_per_cap 809
## country United States
## drives_right True
## Name: US, dtype: object
## AUS
## cars_per_cap 731
## country Australia
## drives_right False
## Name: AUS, dtype: object
## JPN
## cars_per_cap 588
## country Japan
## drives_right False
## Name: JPN, dtype: object
## IN
## cars_per_cap 18
## country India
## drives_right False
## Name: IN, dtype: object
## RU
## cars_per_cap 200
## country Russia
## drives_right True
## Name: RU, dtype: object
## MOR
## cars_per_cap 70
## country Morocco
## drives_right True
## Name: MOR, dtype: object
## EG
## cars_per_cap 45
## country Egypt
## drives_right True
## Name: EG, dtype: object
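The loop that generated the output above is not shown in these notes. A minimal sketch, assuming a cars DataFrame rebuilt from the printed values (the construction below is an assumption, not the course's data-loading code):

```python
import pandas as pd

# Rebuild a cars DataFrame from the values shown in the output above.
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
        "country": ["United States", "Australia", "Japan", "India",
                    "Russia", "Morocco", "Egypt"],
        "drives_right": [True, False, False, False, True, True, True],
    },
    index=["US", "AUS", "JPN", "IN", "RU", "MOR", "EG"],
)

# iterrows() yields (row label, row contents as a pandas Series) pairs.
for lab, row in cars.iterrows():
    print(lab)
    print(row)
```

Each row arrives as a Series whose index holds the column names, which is why a single value can be picked out with square brackets such as row['cars_per_cap'].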
The row data that’s generated by iterrows() on every run is a Pandas Series. This format is not very convenient to print out. Luckily, you can easily select variables from the Pandas Series using square brackets:
# Adapt for loop
for lab, row in cars.iterrows() :
    print(str(lab) + ': ' + str(row['cars_per_cap']))
## US: 809
## AUS: 731
## JPN: 588
## IN: 18
## RU: 200
## MOR: 70
## EG: 45
5.13 Add column
In the video, Hugo showed you how to add the length of the country names of the brics DataFrame in a new column:
You can do similar things on the cars DataFrame.
# loop that adds COUNTRY column
for lab, row in cars.iterrows():
    cars.loc[lab, 'COUNTRY'] = row['country'].upper()
# Print cars
print(cars)
##      cars_per_cap        country  drives_right        COUNTRY
## US            809  United States          True  UNITED STATES
## AUS           731      Australia         False      AUSTRALIA
## JPN           588          Japan         False          JAPAN
## IN             18          India         False          INDIA
## RU            200         Russia          True         RUSSIA
## MOR            70        Morocco          True        MOROCCO
## EG             45          Egypt          True          EGYPT
Using iterrows() to iterate over every observation of a Pandas DataFrame is easy to understand, but not very efficient. On every iteration, you’re creating a new Pandas Series.
If you want to add a column to a DataFrame by calling a function on another column, the iterrows() method in combination with a for loop is not the preferred way to go. Instead, you’ll want to use apply().
Compare the iterrows() version with the apply() version to get the same result in the brics DataFrame:
for lab, row in brics.iterrows() :
    brics.loc[lab, "name_length"] = len(row["country"])

brics["name_length"] = brics["country"].apply(len)
We can do a similar thing to call the upper() method on every name in the country column. However, upper() is a method, so we’ll need a slightly different approach:
##      cars_per_cap        country  drives_right        COUNTRY
## US            809  United States          True  UNITED STATES
## AUS           731      Australia         False      AUSTRALIA
## JPN           588          Japan         False          JAPAN
## IN             18          India         False          INDIA
## RU            200         Russia          True         RUSSIA
## MOR            70        Morocco          True        MOROCCO
## EG             45          Egypt          True          EGYPT
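The apply() call itself is not shown above. One way to get this result, sketched on a small stand-in for cars (three rows only, built here for illustration), is to pass str.upper to apply():

```python
import pandas as pd

# Small stand-in for the cars DataFrame (only the column needed here).
cars = pd.DataFrame(
    {"country": ["United States", "Australia", "Japan"]},
    index=["US", "AUS", "JPN"],
)

# str.upper is the method looked up on the str type, so it can be
# passed to apply() like any other function.
cars["COUNTRY"] = cars["country"].apply(str.upper)
print(cars)
```

Unlike len, upper() cannot be called on its own, so it has to be referenced through the str type. No explicit loop is needed and no new Series is built per row.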
6 Case Study: Hacker Statistics
This chapter will allow you to apply all the concepts you’ve learned in this course. You will use hacker statistics to calculate your chances of winning a bet. Use random number generators, loops, and Matplotlib to gain a competitive edge!
6.1 Lecture: Random Numbers
6.2 Random float
Randomness has many uses in science, art, statistics, cryptography, gaming, gambling, and other fields. You’re going to use randomness to simulate a game.
All the functionality you need is contained in the random package, a sub-package of numpy. You’ll be using two functions from this package:
- seed(): sets the random seed, so that your results are reproducible between simulations. As an argument, it takes an integer of your choosing. If you call the function, no output will be generated.
- rand(): if you don’t specify any arguments, it generates a random float between zero and one.
## 0.6964691855978616
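The code behind this output is not included in these notes. A minimal sketch of how seed() and rand() work together; the seed value 123 is an arbitrary choice for this example:

```python
import numpy as np

# 123 is an arbitrary seed chosen for this sketch.
np.random.seed(123)
first = np.random.rand()

# Re-seeding with the same integer replays the same sequence.
np.random.seed(123)
second = np.random.rand()

print(first)
print(first == second)
```

Setting the seed once at the top of a script is usually enough; it is repeated here only to show that the same seed gives the same number.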
Great! Now let’s simulate a dice.
6.3 Roll the dice
As Hugo explained in the video you can just as well use randint(), also a function of the random package, to generate integers randomly. The following call generates the integer 4, 5, 6 or 7 randomly. 8 is not included.
## 3
## 5
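The calls themselves are missing above. A short sketch of randint() with an exclusive upper bound; the seed and the number of rolls are arbitrary choices for illustration:

```python
import numpy as np

np.random.seed(123)  # arbitrary seed, only for reproducibility

# The upper bound is exclusive: this draws 4, 5, 6 or 7.
print(np.random.randint(4, 8))

# A six-sided dice therefore needs the bounds 1 and 7.
rolls = [int(np.random.randint(1, 7)) for _ in range(1000)]
print(min(rolls), max(rolls))
```

Every value in rolls lies between 1 and 6 inclusive, because 7 is never drawn.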
Alright! Time to actually start coding things up!
6.4 Determine your next move
In the Empire State Building bet, your next move depends on the number you get after throwing the dice. We can perfectly code this with an if-elif-else construct!
# Starting step
step = 50
# Roll the dice
dice = np.random.randint(1, 7)
if dice <= 2 :
    step = step - 1
elif dice < 6 :
    step = step + 1
else :
    step = step + np.random.randint(1,7)
# Print out dice and step
print([dice, step])
## [3, 51]
6.5 Lecture: Random Walk
6.6 The next step
You have already written Python code that determines the next step based on the previous step. Now it’s time to put this code inside a for loop so that we can simulate a random walk.
# Initialize random_walk
random_walk = [0]
for x in range(100) :
    # Set step: last element in random_walk
    step = random_walk[-1]
    # Roll the dice
    dice = np.random.randint(1,7)
    # Determine next step
    if dice <= 2:
        step = step - 1
    elif dice <= 5:
        step = step + 1
    else:
        step = step + np.random.randint(1,7)
    # append next_step to random_walk
    random_walk.append(step)
# Print random_walk
print(random_walk)
## [0, -1, 0, 1, 2, 1, 0, -1, -2, -3, -4, -5, -6, -5, 0, -1, -2, -1, -2, -1, 0, 1, 2, 3, 2, 3, 2, 3, 4, 5, 6, 5, 9, 10, 9, 10, 9, 10, 11, 12, 13, 14, 15, 16, 19, 20, 21, 22, 27, 28, 32, 33, 32, 33, 34, 33, 34, 35, 37, 38, 39, 38, 37, 38, 39, 38, 37, 38, 39, 41, 40, 39, 40, 39, 40, 41, 42, 44, 43, 44, 45, 46, 47, 48, 47, 46, 47, 46, 47, 48, 47, 50, 51, 52, 53, 52, 53, 54, 58, 57, 56]
6.7 How low can you go?
Things are shaping up nicely! You already have code that calculates your location in the Empire State Building after 100 dice throws. However, there’s something we haven’t thought about - you can’t go below 0!
A typical way to solve problems like this is by using max(). If you pass max() two arguments, the biggest one gets returned. For example, to make sure that a variable x never goes below 10 when you decrease it, you can use:
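The snippet for this example is missing from the notes. A minimal sketch of the pattern described, with an assumed starting value of 12 and five decrements:

```python
x = 12

# Decrease x by 1 several times, but never let it drop below 10.
for _ in range(5):
    x = max(10, x - 1)
    print(x)
```

Once x reaches 10, max() keeps returning 10 because x - 1 is smaller than the lower bound.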
random_walk = [0]
for x in range(100) :
    step = random_walk[-1]
    dice = np.random.randint(1,7)
    if dice <= 2:
        # use max to make sure step can't go below 0
        step = max(0, step - 1)
    elif dice <= 5:
        step = step + 1
    else:
        step = step + np.random.randint(1,7)
    random_walk.append(step)
print(random_walk)
## [0, 2, 1, 2, 4, 5, 6, 11, 10, 11, 12, 13, 14, 15, 14, 19, 20, 21, 22, 21, 20, 19, 18, 17, 18, 19, 20, 26, 25, 24, 23, 24, 25, 26, 25, 26, 27, 26, 31, 32, 31, 30, 29, 28, 29, 28, 27, 29, 30, 33, 34, 36, 37, 38, 39, 38, 37, 38, 39, 40, 41, 40, 41, 42, 43, 46, 47, 48, 47, 48, 47, 48, 49, 50, 54, 53, 52, 53, 54, 55, 54, 55, 54, 55, 57, 62, 61, 62, 63, 64, 65, 66, 67, 66, 67, 68, 69, 71, 73, 72, 73]
You’re not going below zero anymore. Great!
6.8 Visualize the walk
Let’s visualize this random walk! Remember how you could use matplotlib to build a line plot?
The first list you pass is mapped onto the x-axis and the second list is mapped onto the y-axis.
If you pass only one argument, matplotlib will use the index of the list for the x-axis and the values in the list for the y-axis.
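The plotting snippet referred to here is not included in these notes. A small sketch of both call styles, using made-up data; the Agg backend is selected only so that the example also runs without a display, which is an assumption about the environment and not part of the course code:

```python
import matplotlib
matplotlib.use("Agg")  # assumed non-interactive backend, for portability
import matplotlib.pyplot as plt

# Hypothetical example data.
x = [1, 2, 3, 4]
y = [10, 20, 15, 30]

# Two arguments: x is mapped onto the x-axis, y onto the y-axis.
two_args = plt.plot(x, y)[0]

# One argument: the list indexes 0, 1, 2, ... are used for the x-axis.
one_arg = plt.plot(y)[0]

print(list(one_arg.get_xdata()))
```

In an interactive session you would follow the plot() calls with plt.show() to display the figure.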
# Initialization
random_walk = [0]
for x in range(100) :
    step = random_walk[-1]
    dice = np.random.randint(1,7)
    if dice <= 2:
        step = max(0, step - 1)
    elif dice <= 5:
        step = step + 1
    else:
        step = step + np.random.randint(1,7)
    random_walk.append(step)
# Plot random_walk
plt.plot(random_walk)
# Show the plot
plt.show()
This is pretty cool! You can clearly see how your random walk progressed.
6.9 Lecture: Distribution
6.10 Simulate multiple walks
A single random walk is one thing, but that doesn’t tell you if you have a good chance at winning the bet. To get an idea about how big your chances are of reaching 60 steps, you can repeatedly simulate the random walk and collect the results.
# Initialize all_walks
all_walks = []
# Simulate random walk 10 times
for i in range(10) :
    random_walk = [0]
    for x in range(100) :
        step = random_walk[-1]
        dice = np.random.randint(1,7)
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1,7)
        random_walk.append(step)
    # Append random_walk to all_walks
    all_walks.append(random_walk)
# Print all_walks
print(all_walks)
## [[0, 1, 2, 3, 4, 5, 6, 7, 6, 7, ..., 62, 68, 67, 68], [0, 0, 0, 1, 5, 6, 7, 8, 9, 8, ..., 69, 68, 67, 68], ..., [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, ..., 62, 63, 64, 65]]
6.11 Visualize all walks
all_walks is a list of lists: every sub-list represents a single random walk. If you convert this list of lists to a NumPy array, you can start making interesting plots!
# Import the packages used below
import numpy as np
import matplotlib.pyplot as plt
# Initialize and populate all_walks
all_walks = []
for i in range(10):
    random_walk = [0]
    for x in range(100):
        step = random_walk[-1]
        dice = np.random.randint(1,7)
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1,7)
        random_walk.append(step)
    all_walks.append(random_walk)
# Convert all_walks to NumPy array: np_aw
np_aw = np.array(all_walks)
# Plot np_aw and show
plt.plot(np_aw)
plt.show()
# Clear the figure
plt.clf()
# Transpose np_aw: np_aw_t
np_aw_t = np.transpose(np_aw)
# Plot np_aw_t and show
plt.plot(np_aw_t)
plt.show()
Good job! You can clearly see how the different simulations of the random walk went. Transposing the 2D NumPy array was crucial: plt.plot() draws one line per column, so without the transpose each time step is drawn as a line across the 10 walks, instead of each walk being drawn as a line across the time steps.
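To see why the transpose matters, it can help to inspect the array shapes directly. The following is a minimal sketch, not the course's own solution: the helper simulate_walk and the seed value are assumptions introduced here for illustration, and a seeded NumPy generator is used in place of np.random.randint so the run is reproducible.

```python
import numpy as np

rng = np.random.default_rng(123)  # arbitrary seed, chosen only for reproducibility

def simulate_walk(n_steps=100):
    """Hypothetical helper: one random walk following the dice rules above."""
    walk = [0]
    for _ in range(n_steps):
        step = walk[-1]
        dice = rng.integers(1, 7)  # integer from 1 to 6 inclusive
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + rng.integers(1, 7)
        walk.append(int(step))
    return walk

np_aw = np.array([simulate_walk() for _ in range(10)])
np_aw_t = np.transpose(np_aw)

print(np_aw.shape)    # (10, 101): one row per walk
print(np_aw_t.shape)  # (101, 10): one column per walk
```

After the transpose, column j of np_aw_t holds walk j, which is the layout plt.plot() needs in order to draw one line per walk.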
6.12 Implement clumsiness
There’s still something we forgot! You’re a bit clumsy and you have a 0.5% chance of falling down. That calls for another random number generation. Basically, you can generate a random float between 0 and 1. If this value is less than or equal to 0.005, you should reset step to 0.
# Import the packages used below
import numpy as np
import matplotlib.pyplot as plt
# Simulate random walk 250 times
all_walks = []
for i in range(250):
    random_walk = [0]
    for x in range(100):
        step = random_walk[-1]
        dice = np.random.randint(1,7)
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1,7)
        # Implement clumsiness
        if np.random.rand() <= 0.005:
            step = 0
        random_walk.append(step)
    all_walks.append(random_walk)
# Create and plot np_aw_t
np_aw_t = np.transpose(np.array(all_walks))
plt.plot(np_aw_t)
plt.show()
Superb! Look at the plot. In some of the simulations you do indeed take a deep dive down!
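The clumsiness check can also be sanity-checked on its own. np.random.rand() draws a float uniformly from [0, 1), so comparing it with 0.005 succeeds about 0.5% of the time. Assuming the falls are independent from step to step, the chance of at least one fall during a 100-step walk is 1 - 0.995^100. The sketch below is only an illustration; the names p_fall, n_steps and n_draws and the seed are assumptions that do not appear in the course code.

```python
import numpy as np

p_fall = 0.005   # chance of falling on any single step (from the text above)
n_steps = 100    # steps per walk

# Analytic chance of at least one fall in a walk, assuming independent steps
p_any_fall = 1 - (1 - p_fall) ** n_steps
print(round(p_any_fall, 3))  # about 0.394

# Empirical check: fraction of uniform draws that would trigger a fall
rng = np.random.default_rng(123)  # arbitrary seed
n_draws = 1_000_000
observed = np.mean(rng.random(n_draws) <= p_fall)
print(observed)  # should be close to p_fall
```

So although a fall on any given step is rare, a sizeable share of the 250 walks should show at least one reset to 0, which matches the deep dives visible in the plot.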
6.13 Plot the distribution
All these fancy visualizations have put us on a sidetrack. We still have to solve the million-dollar problem: What are the odds that you’ll reach 60 steps high on the Empire State Building?
Basically, you want to know about the end points of all the random walks you’ve simulated. These end points have a certain distribution that you can visualize with a histogram.
# Import the packages used below
import numpy as np
import matplotlib.pyplot as plt
# Simulate random walk 500 times
all_walks = []
for i in range(500):
    random_walk = [0]
    for x in range(100):
        step = random_walk[-1]
        dice = np.random.randint(1,7)
        if dice <= 2:
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            step = step + np.random.randint(1,7)
        # Implement clumsiness (0.5% chance of falling, as in 6.12)
        if np.random.rand() <= 0.005:
            step = 0
        random_walk.append(step)
    all_walks.append(random_walk)
# Create np_aw_t
np_aw_t = np.transpose(np.array(all_walks))
# Select last row from np_aw_t: ends
ends = np_aw_t[-1,:]
# Plot histogram of ends, display plot
plt.hist(ends)
plt.show()
Great job! Have a look at the histogram; what do you think your chances are?
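The histogram gives a visual impression; the odds themselves can be estimated as the fraction of end points at or above step 60. The following is a hedged sketch rather than the course's solution: simulate_ends is a hypothetical helper, the seed is arbitrary, and p_fall is exposed as a parameter (defaulting to the 0.5% chance stated in 6.12) so that other values can be tried.

```python
import numpy as np

def simulate_ends(n_walks=500, n_steps=100, p_fall=0.005, seed=123):
    """Hypothetical helper: return the end point of each simulated walk."""
    rng = np.random.default_rng(seed)
    ends = np.empty(n_walks, dtype=int)
    for i in range(n_walks):
        step = 0
        for _ in range(n_steps):
            dice = rng.integers(1, 7)  # integer from 1 to 6 inclusive
            if dice <= 2:
                step = max(0, step - 1)
            elif dice <= 5:
                step += 1
            else:
                step += rng.integers(1, 7)
            # Clumsiness: occasionally fall back to step 0
            if rng.random() <= p_fall:
                step = 0
        ends[i] = step
    return ends

ends = simulate_ends()
# Fraction of walks that finish at step 60 or higher
odds = np.mean(ends >= 60)
print(odds)
```

Because the estimate is based on a finite number of random walks, it will vary with the seed; raising n_walks makes it more stable.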
7 Final Words
Congratulations on completing the course! More courses, tracks and instructions can be found here. Happy learning!